Language/OS - Multiplatform Resource Library

home *** CD-ROM | disk | FTP | other *** search

/ Language/OS - Multiplatform Resource Library / LANGUAGE OS.iso / dsp / dspgroup / fft32010.arc / README.LIS < prev

Wrap

File List | 1986-01-18 | 8.4 KB | 219 lines

REFERENCE: FFT32010 is a collect of 9 FFT programs written in the TMS32010 source code. These programs and the following writeup can be found in the Chapter 5 of "DFT/FFT and Convolution Algorithms - Theory and Implementation," C.S. Burrus and T.W. Parks, Wiley, 1985. TABLE 5-1. TMS32010 ASSEMBLY LANGUAGE PROGRAM INDEX _____________________________________________________________________ | PROGRAM DESCRIPTION PAGE | |-------------------------------------------------------------------| | 1 Direct DFT with table lookup 148 | | 2 Second-order Goertzel with forward-order input 152 | | 3 Basic one-butterfly Cooley-Tukey radix-2 FFT 156 | | 4 Basic one-butterfly radix-4 FFT 161 | | 5 Three-butterfly radix-4 FFT with reduced storage 168 | | 6 Straight-line 64-point radix-4 FFT 179 | | 7 Two-butterfly radix-8 FFT 190 | | 8 Prime-factor algorithm FFT with unscrambling 205 | | 9 Direct convolution routine 220 | --------------------------------------------------------------------- Chapter 5. TMS32010 ASSEMBLY LANGUAGE PROGRAMS FOR THE DFT AND CONVOLUTION 5.1 INTRODUCTION The TMS32010 assembly language programs in this chapter are based on the FORTRAN routines from Chapter 4. As much as possible, these programs retain the program structure and variable names of their FORTRAN counterparts. Many of the comments in these programs are lines of FORTRAN from the prototype routine which help explain the assembly language. The FORTRAN programs have been followed as much as possible to retain clarity and consistency. At the same time, some assumptions have been made to take advantage of the features of the TMS32010 instruction set. First, many of the programs assume a minimal amount of external memory used to hold the data which is addressed via peripheral I/O instructions, i.e., an address is initially sent to a data address counter and then the data word read from or written to the memory. A counter is used to address the data locations sequentially since complex data is used whose structure consists of the real and imaginary parts of a number in consecutive memory locations. A Q15 format for number representation is assumed for all data and coefficient values. This format (one sign bit + 15 fractional bits; the absolute value of all represented numbers is less than one) simplifies calculations in fixed-point machines, such as the TMS32010 digital signal processor. No scaling on intermediate values in the calculation was done, thus helping to clarify the code. For most data input sequences, scaling is unnecessary since they will not produce accumulator overflow in the processor. Table lookup of coefficients is used for speed and simplicity. These programs use a full size table (often the size of the transform). Smaller tables may be used at the expense of speed if program memory size is limited. All programs have been assembled using a Texas Instruments cross-assembler running on a VAX 11-780 with the VMS operating system. 5.2 TMS32010 PROGRAMS Program 1 is a direct DFT similar to the first FORTRAN program. All data is initially in on-chip data RAM, and results are output sequentially. The data is on page 0 and is addressed indirectly with the auxiliary registers. All other variables are located on page 1 of the TMS32010. Full 32-bit intermediate values are retained for maximum accuracy. Note that the 32-bit running sum is initialized Page 2 by shifting the first data point 15 bits which simulates a multiply by one in Q15 format. In this and all the other assembly language programs, arrays and loop counters start at zero, unlike the FORTRAN routines. Program 2 is a TMS32010 implementation of a second-order Goertzel's algorithm with forward-order inputs similar to FORTRAN Program 4. Like the DFT program, input data is initially on page 0 of data RAM and is addressed indirectly. Results are output sequentially. The 2*cosine factor in the inner loop is implemented by adding twice the product of a single cosine multiply. This 2*cosine term along with the feedback loop make this algorithm very susceptible to quantization error and overflows on a fixed-point machine. Since no scaling is done in this algorithm, the data should be scaled previously to guard against overflow. A single butterfly radix-2 Cooley-Tukey FFT, similar to FORTRAN Program 6, is shown in Program 3. All data is in external data memory using hardware described in the introduction of this chapter. Because the real and imaginary parts of the complex input data are in sequential locations, not in separate arrays, the data index I has a value twice that in the corresponding FORTRAN routine. The variable IE, the index increment for the coefficients, is computed with multiplies by 2 (implemented with a shift) in order to eliminate divisions which are inefficient. The size of the transform is limited to a power of 2, and the largest transform is simply a function of the amount of available program memory. Program 4 is just like Program 3 but with a single radix-4 butterfly. The transform size is limited to a power of 4. Program 5 is a radix-4 algorithm with three butterflies, similar to FORTRAN Program 13. Both auxiliary registers are used as loop counters in the program. Auxiliary Register 1 (AR1) replaces K in the DO 10 loop, and Auxiliary Register 0 (AR0) is used in the DO 1 and DO 20 loops. These registers are efficient as counters in code with a lot of loops. Program 6 is a 64-point complex FFT optimized for speed. The algorithm is a three-butterfly radix-4 with all straight-line code. This program uses the macro capability of the assembler to generate the code, which is almost 2800 lines long when expanded. Three macros are used, one for each type of butterfly. All data begins and ends on page 0 of data RAM, and page 1 contains two temporary locations which are addressed by auxiliary registers. The 13-bit immediate coefficients are used as part of the MPYK (multilpy immediate) instruction. Butterflies are accessed in the same order as in Program 5, and all data points are addressed directly for speed. Page 3 A radix-8 FFT is shown in Program 7. This is a two-butterfly FFT and corresponds to Program 16 in FORTRAN. Because this a radix-8 program, there are half as many external data accesses as for the looped radix-4. This helps to speed up the program. The tradeoff is in a more complex program and fewer transform sizes which can be computed. Program 8 is a prime-factor FFT, corresponding to FORTRAN Program 17. This program is specific for a length-63 PFA, but the structure can be used for all module and transform sizes. Coefficients are in a table initially and then read into data RAM for use during execution. Because the unscrambling is not an in-place algorithm, the final unscrambled results are written to port 2, which corresponds to arrays A and B in the FORTRAN program. A direct calculation of linear convolution is shown in Program 9. In this example, an infinite input sequence is convolved with a 32-bit permanent sequence which represents the impulse response of a bandpass filter. Input is read sequentially from port 0, and output is written to port 1. The main computation in the convolution is done using the pair of instructions, LTD and MPY, which multiplies the sequences point by point and shifts the input sequence for the next loop. REFERENCE 1. Texas Instruments, TMS32010 USER'S GUIDE. Houston, TX: Texas Instruments, 1983.